About Data Lineage

Data lineage is the process of tracking the origin, movement, transformation, and usage of data across an organization's ecosystem. It helps teams understand where data comes from, how it flows through systems, and how it changes over time.

Data lineage provides the following capabilities and benefits:

  • Source-to-destination tracking: Maps data from its raw source (e.g., a database or an API) to its final destination (e.g., reports or dashboards).
  • Transformation history: Shows how data is modified by ETL/ELT pipelines, SQL queries, and other processing steps.
  • Dependencies & relationships: Identifies upstream and downstream dependencies, so you know which assets a pipeline relies on and which assets rely on it before you perform maintenance.
  • Troubleshooting and debugging: When data quality issues arise, lineage helps pinpoint the source of the problem by tracing the data back to its origin and through all the transformations it underwent.
  • Impact analysis: Before you change a data source or pipeline, lineage shows which dependent assets downstream could be affected, such as workbooks and, in the future, BI dashboards. This helps prevent unintended consequences.
  • Compliance and auditing: Many regulations require organizations to demonstrate a clear understanding of their data flows. Data lineage provides the transparency needed for compliance reviews and audits.
  • Data governance: By visualizing data flows, lineage gives a clear overview of how data is used and managed across the platform, which supports data governance.
  • Building trust in data: When users can see where their data originated and which transformations were applied to it, they can place more confidence in its accuracy and reliability.
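
Upstream tracing (for troubleshooting) and downstream tracing (for impact analysis) can both be pictured as traversals of a directed graph. The following is a minimal sketch of that idea, not DataGOL's implementation or API: the LineageGraph class and every asset name in it are hypothetical and exist only for illustration.

```python
from collections import defaultdict, deque


class LineageGraph:
    """Toy lineage graph (hypothetical): an edge src -> dst means dst is derived from src."""

    def __init__(self):
        self._downstream = defaultdict(set)  # asset -> assets derived directly from it
        self._upstream = defaultdict(set)    # asset -> assets it is derived directly from

    def add_edge(self, src, dst):
        self._downstream[src].add(dst)
        self._upstream[dst].add(src)

    def _reach(self, start, neighbors):
        # Breadth-first search that collects every asset reachable from start.
        seen = set()
        queue = deque([start])
        while queue:
            node = queue.popleft()
            for nxt in neighbors.get(node, ()):
                if nxt not in seen:
                    seen.add(nxt)
                    queue.append(nxt)
        return seen

    def upstream(self, node):
        """Everything node depends on: where to look when node's data is wrong."""
        return self._reach(node, self._upstream)

    def downstream(self, node):
        """Everything that depends on node: what a change to node could affect."""
        return self._reach(node, self._downstream)


# Hypothetical assets, invented for this example.
graph = LineageGraph()
graph.add_edge("orders_db", "orders_pipeline")
graph.add_edge("orders_pipeline", "orders_clean")
graph.add_edge("orders_clean", "revenue_workbook")
graph.add_edge("crm_api", "customers_pipeline")
graph.add_edge("customers_pipeline", "revenue_workbook")

print(sorted(graph.upstream("revenue_workbook")))
print(sorted(graph.downstream("orders_db")))
```

In this sketch, graph.upstream("revenue_workbook") returns every asset that feeds the workbook, and graph.downstream("orders_db") returns every asset that a change to that source could affect.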

DataGOL currently provides data lineage for the following asset types:

  • Pipelines: Shows the flow of data from data sources through the pipeline's transformations to the destination.
  • Data Sources: Displays all the pipelines and workbooks that utilize a specific data source, illustrating its usage across the platform.
  • Workbooks: Visualizes the upstream data sources and any intermediate materialized views (a type of workbook created from published queries) that contribute to the workbook's data.
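
One way to think about the data source and workbook views is as two filtered queries over the same set of typed lineage edges. The sketch below is an assumption-laden illustration, not DataGOL's data model: the Asset class, the kind labels, the sample assets, and the choice to follow dependencies transitively are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Asset:
    name: str
    kind: str  # assumed labels: "data_source", "pipeline", "materialized_view", "workbook"


# Hypothetical assets and edges (source asset, derived asset).
orders_db = Asset("orders_db", "data_source")
crm_db = Asset("crm_db", "data_source")
orders_pipeline = Asset("orders_pipeline", "pipeline")
monthly_orders = Asset("monthly_orders", "materialized_view")
revenue = Asset("revenue", "workbook")

EDGES = [
    (orders_db, orders_pipeline),
    (orders_db, monthly_orders),
    (monthly_orders, revenue),
    (crm_db, revenue),
]


def walk(start, edges, reverse=False):
    """Return every asset reachable from start, following edges forward or backward."""
    found, stack = set(), [start]
    while stack:
        current = stack.pop()
        for src, dst in edges:
            a, b = (dst, src) if reverse else (src, dst)
            if a == current and b not in found:
                found.add(b)
                stack.append(b)
    return found


def data_source_usage(source, edges):
    """Data source view: pipelines and workbooks that use the source (transitively, by assumption)."""
    wanted = {"pipeline", "workbook", "materialized_view"}
    return {a.name for a in walk(source, edges) if a.kind in wanted}


def workbook_upstream(workbook, edges):
    """Workbook view: upstream data sources and intermediate materialized views."""
    wanted = {"data_source", "materialized_view"}
    return {a.name for a in walk(workbook, edges, reverse=True) if a.kind in wanted}


print(sorted(data_source_usage(orders_db, EDGES)))
print(sorted(workbook_upstream(revenue, EDGES)))
```

Materialized views are included among the consumers of a data source here because the text describes them as a type of workbook.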